0 Summary on Common Distributions

| Name | Distribution | $E$ | $\mathrm{Var}$ | MGF | Char.[1] |
|---|---|---|---|---|---|
| Bernoulli$(p)$ | $P(X=1)=p,\ P(X=0)=1-p$ | $p$ | $p(1-p)$ | $1-p+pe^{t}$ | $1-p+pe^{it}$ |
| Binomial$(n,p)$ | $P(S_n=k)=\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ | $(1-p+pe^{t})^{n}$ | $(1-p+pe^{it})^{n}$ |
| Multinomial | | | | | |
| Geometric$(p)$ | $P(W=k)=(1-p)^{k-1}p,\ k\in\mathbb{N}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | $\frac{pe^{t}}{1-(1-p)e^{t}}$ | $\frac{pe^{it}}{1-(1-p)e^{it}}$ |
| NB$(r,p)$[2] | $P(F_r=k)=\binom{r+k-1}{k}p^r(1-p)^k$ | $\frac{r(1-p)}{p}$ | $\frac{r(1-p)}{p^2}$ | $\left(\frac{p}{1-(1-p)e^{t}}\right)^{r}$ | $\left(\frac{p}{1-(1-p)e^{it}}\right)^{r}$ |
| Hypergeom$(N,B,n)$[3] | $P(X=k)=\frac{\binom{B}{k}\binom{N-B}{n-k}}{\binom{N}{n}}$ | $n\frac{B}{N}$ | $n\frac{B}{N}\left(1-\frac{B}{N}\right)\frac{N-n}{N-1}$ | | |
| Poisson$(\lambda)$ | $P(X=k)=\frac{\lambda^k}{k!}e^{-\lambda}$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^{t}-1)}$ | $e^{\lambda(e^{it}-1)}$ |
| Uniform$(a,b)$ | $f(x)=\frac{1}{b-a}\mathbf{1}_{[a,b]}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{tb}-e^{ta}}{t(b-a)}$ | $\frac{e^{itb}-e^{ita}}{it(b-a)}$ |
| Laplace$(\mu,b)$ | $f(x)=\frac{1}{2b}\exp\left(-\frac{\lvert x-\mu\rvert}{b}\right)$ | $\mu$ | $2b^2$ | $\frac{e^{t\mu}}{1-b^2t^2}$ | $\frac{e^{it\mu}}{1+b^2t^2}$ |
| $N(\mu,\sigma^2)$ | $f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ | $e^{t\mu+\frac{1}{2}\sigma^2t^2}$ | $e^{it\mu-\frac{1}{2}\sigma^2t^2}$ |
| $\chi^2_k$ | | $k$ | $2k$ | $(1-2t)^{-k/2}$ | $(1-2it)^{-k/2}$ |
| Gamma$(\alpha,\beta)$ | $f(x)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\ x>0$ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^2}$ | $\left(1-\frac{t}{\beta}\right)^{-\alpha}$ | $\left(1-\frac{it}{\beta}\right)^{-\alpha}$ |
| Exp$(\lambda)$ | $f(x)=\lambda e^{-\lambda x},\ x\ge 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ | $\left(1-\frac{t}{\lambda}\right)^{-1}$ | $\left(1-\frac{it}{\lambda}\right)^{-1}$ |
| Beta$(\alpha,\beta)$ | $f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $1+\sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^k}{k!}$ | ${}_1F_1(\alpha;\alpha+\beta;it)$ |
| $N(\mu,\Sigma)$ | | $\mu$ | $\Sigma$ | $e^{t^\top\mu+\frac{1}{2}t^\top\Sigma t}$ | $e^{it^\top\mu-\frac{1}{2}t^\top\Sigma t}$ |

1 Preliminary

1.1 Generalized Binomial Coefficient

For $a\in\mathbb{C}$ and any integer $k\ge 0$, the generalized binomial coefficient is defined as $$\binom{a}{k}=\frac{a(a-1)(a-2)\cdots(a-k+1)}{k!}.$$
In particular, if $a=-m$ where $m\in\mathbb{N}$, then $\binom{-m}{k}=(-1)^k\binom{m+k-1}{k}$.

1.2 Newton Binomial Theorem

Let $a\in\mathbb{C}$ and $x\in\mathbb{C}$ with $|x|<1$. Then

$$(1+x)^a=\sum_{k=0}^{\infty}\binom{a}{k}x^k.$$
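As a quick numerical sketch (not part of the original notes), the generalized binomial coefficient, the identity for $a=-m$, and the convergence of Newton's series can all be checked with Python's standard library; the helper `gbinom` and the sample values of `m`, `k`, `a`, `x` are arbitrary choices for illustration.

```python
from math import comb, isclose

def gbinom(a: float, k: int) -> float:
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k!."""
    result = 1.0
    for j in range(k):
        result *= (a - j) / (j + 1)
    return result

# Identity: C(-m, k) = (-1)^k * C(m+k-1, k)
m, k = 3, 5
lhs = gbinom(-m, k)
rhs = (-1) ** k * comb(m + k - 1, k)
assert isclose(lhs, rhs)

# Newton's binomial theorem: partial sums of sum_k C(a,k) x^k approach (1+x)^a for |x| < 1
a, x = -2.0, 0.3
series = sum(gbinom(a, k) * x**k for k in range(200))
assert isclose(series, (1 + x) ** a, rel_tol=1e-9)
```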

2 Bernoulli Trial and Bernoulli distribution

A Bernoulli trial is a random experiment having 2 possible outcomes, commonly labeled as success and failure. If $X$ is the indicator of success, then $X$ follows a Bernoulli distribution, written $X\sim\mathrm{Bernoulli}(p)$, if $P[X=1]=p$ and $P[X=0]=1-p$.

2.1 Expectation and variance of Bernoulli distribution

Assume that $X\sim\mathrm{Bernoulli}(p)$. Then $E[X]=p$, $E[X^2]=p$, and therefore $\mathrm{Var}[X]=E[X^2]-E[X]^2=p(1-p)$.

3 Binomial Distribution

3.1 Introduction of binomial distribution

Assume that you are testing the toys manufactured by a factory, where the probability that a toy is defective is $p$. In order to decide whether to accept the batch, you randomly sample $n$ toys. Let $X$ denote the number of defective toys ($0\le X\le n$). Then $X$ follows a binomial distribution with parameters $n$ and $p$, written $X\sim\mathrm{Bin}(n,p)$. The pmf of $X$ is $$P[X=k]=\binom{n}{k}p^k(1-p)^{n-k},\quad k\in\{0,1,\dots,n\}.$$

Tip

We can consider the binomial distribution as $n$ independent Bernoulli trials: the random variable $X\sim\mathrm{Bin}(n,p)$ denotes the number of successes in $n$ independent Bernoulli trials, $$X=\xi_1+\xi_2+\cdots+\xi_n,\qquad \xi_i\sim\mathrm{Bernoulli}(p).$$

3.2 Expectation and Variance of binomial distribution

  1. We can directly compute the expectation and variance from the first and second moments of $X$:
$$E[X]=\sum_{k=0}^{n}k\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^{n}k\binom{n}{k}p^k(1-p)^{n-k}=np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1-p)^{(n-1)-(k-1)}=np\sum_{t=0}^{n-1}\binom{n-1}{t}p^{t}(1-p)^{n-1-t}=np.$$
$$\begin{aligned}E[X^2]&=\sum_{k=0}^{n}k^2\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^{n}k(k-1)\binom{n}{k}p^k(1-p)^{n-k}+\sum_{k=1}^{n}k\binom{n}{k}p^k(1-p)^{n-k}\\&=p^2n(n-1)\sum_{k=2}^{n}\binom{n-2}{k-2}p^{k-2}(1-p)^{(n-2)-(k-2)}+E[X]=p^2n(n-1)\sum_{t=0}^{n-2}\binom{n-2}{t}p^{t}(1-p)^{n-2-t}+E[X]\\&=p^2n(n-1)+np.\end{aligned}$$

Therefore $E[X]=np$ and $\mathrm{Var}[X]=E[X^2]-E[X]^2=p^2n(n-1)+np-(np)^2=np(1-p)$.

  2. Let $\xi_i$ denote the indicator variable of success of the $i$th Bernoulli trial. Then the number of successes can be expressed as $X=\xi_1+\cdots+\xi_n$. Using the expectation and variance of the Bernoulli distribution, together with linearity of expectation and additivity of variance for independent random variables, we get $$E[X]=\sum_{i=1}^{n}E[\xi_i]=np,\qquad \mathrm{Var}[X]=\sum_{i=1}^{n}\mathrm{Var}[\xi_i]=np(1-p).$$
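The closed forms $E[X]=np$ and $\mathrm{Var}[X]=np(1-p)$ can be verified by summing the pmf exactly with `math.comb`; this is a sanity-check sketch, not from the notes, and the values of `n` and `p` are arbitrary.

```python
from math import comb, isclose

n, p = 12, 0.3

def pmf(k: int) -> float:
    # Binomial pmf P[X = k] for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = sum(k * pmf(k) for k in range(n + 1))
second = sum(k * k * pmf(k) for k in range(n + 1))
var = second - mean**2

assert isclose(mean, n * p)
assert isclose(var, n * p * (1 - p))
```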

4 Geometric Distribution

4.1 Pmf of Geometric Distribution

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$. Assume a random variable $X\sim\mathrm{Geo}(p)$; then $$P[X=k]=(1-p)^{k-1}p,\quad k=1,2,3,\dots$$

4.2 Expectation of Geometric Distribution

Here we use an interesting differentiation trick to solve $E[X]$ where $X\sim\mathrm{Geo}(p)$. Writing $q=1-p$,
$$E[X]=\sum_{k=1}^{\infty}kq^{k-1}p=p\sum_{k=1}^{\infty}\frac{d}{dq}q^{k}=p\,\frac{d}{dq}\left(\frac{q}{1-q}\right)=\frac{p}{(1-q)^2}=\frac{1}{p}.$$
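The series above converges quickly, so truncating it gives a cheap numerical check of $E[X]=1/p$ (an illustrative sketch, not part of the original notes; `p` is an arbitrary choice).

```python
from math import isclose

p = 0.25
q = 1 - p
# Truncate the series sum_{k>=1} k q^{k-1} p; the tail decays geometrically.
mean = sum(k * q ** (k - 1) * p for k in range(1, 2000))
assert isclose(mean, 1 / p, rel_tol=1e-9)
```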

4.3 Variance of Geometric Distribution

A similar computation with the second derivative of $\sum_{k}q^{k}$ gives $E[X(X-1)]=\frac{2(1-p)}{p^2}$, hence for $X\sim\mathrm{Geo}(p)$,
$$\mathrm{Var}[X]=E[X(X-1)]+E[X]-E[X]^2=\frac{2(1-p)}{p^2}+\frac{1}{p}-\frac{1}{p^2}=\frac{1-p}{p^2}.$$

4.4 Sum of Geometric-distributed random variables

Assume we have $n$ independent random variables $X_1,\dots,X_n$ satisfying $X_i\sim\mathrm{Geo}(p)$, $i=1,\dots,n$. From the definition of the geometric distribution we know that each $X_i$ counts the number of trials until the first success. Then $Y=X_1+X_2+\cdots+X_n$ is the number of trials until the $n$th success. Therefore the last trial is a success and the preceding $k-1$ trials contain exactly $n-1$ successes. Thus
$$P[Y=k]=\binom{k-1}{n-1}p^{n-1}(1-p)^{k-n}\cdot p=\binom{k-1}{n-1}p^{n}(1-p)^{k-n},\quad k=n,n+1,\dots$$
We call this distribution the Negative Binomial Distribution.
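The claim that a sum of i.i.d. geometric variables follows this pmf can be checked by simulation; this is an illustrative sketch under arbitrary choices of `n`, `p`, the seed, and the probe point `k`, not part of the original notes.

```python
import random
from math import comb

random.seed(0)
n, p = 3, 0.4

def geometric(p: float) -> int:
    """Number of Bernoulli(p) trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

N = 200_000
counts = {}
for _ in range(N):
    y = sum(geometric(p) for _ in range(n))
    counts[y] = counts.get(y, 0) + 1

# Compare the empirical frequency of Y = k with the negative binomial pmf.
k = 5
empirical = counts.get(k, 0) / N
exact = comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)
assert abs(empirical - exact) < 0.01
```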

5 Pascal Distribution

In a sequence of independent Bernoulli trials with success probability $p$, the Pascal distribution focuses on the number of failures when the $r$th success happens. If a random variable $X\sim\mathrm{Pascal}(r,p)$, then $$P[X=k]=\binom{k+r-1}{k}p^r(1-p)^k,\quad k=0,1,2,\dots$$

5.1 Expectation and variance of Pascal distribution

If $X\sim\mathrm{Pascal}(r,p)$, then $$E[X]=\frac{r(1-p)}{p},\qquad \mathrm{Var}[X]=\frac{r(1-p)}{p^2}.$$

6 Negative Binomial Distribution

In a sequence of independent Bernoulli trials with success probability $p$, the negative binomial distribution focuses on the number of trials when the $r$th success happens. If a random variable $X\sim\mathrm{NB}(r,p)$, then $$P[X=k]=\binom{k-1}{r-1}p^r(1-p)^{k-r},\quad k=r,r+1,\dots$$ From the above discussion about the geometric distribution, we have $X=X_1+X_2+\cdots+X_r$, where the $X_i\sim\mathrm{Geo}(p)$ ($i\in[r]$) are independent and $X_i$ denotes the number of trials needed after the $(i-1)$st success to obtain the $i$th success.

6.1 Expectation and variance of negative binomial distribution

$$E[X]=\sum_{k=r}^{\infty}k\binom{k-1}{r-1}p^r(1-p)^{k-r}.$$

Rather than evaluating this sum directly, we can use the decomposition $X=X_1+\cdots+X_r$ into independent $\mathrm{Geo}(p)$ variables:
$$E[X]=\sum_{i=1}^{r}E[X_i]=\frac{r}{p},\qquad \mathrm{Var}[X]=\sum_{i=1}^{r}\mathrm{Var}[X_i]=\frac{r(1-p)}{p^2}.$$

7 Gaussian Distribution

7.1 Pdf of Gaussian Distribution

If a random variable $X\sim N(\mu,\sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance, then the pdf of $X$ is $$f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.$$

7.2 Mgf of Gaussian Distribution

Assume a random variable $X\sim N(\mu,\sigma^2)$. Then the mgf of $X$ is $$M_X(t)=E[e^{tX}]=\exp\left(\mu t+\frac{1}{2}\sigma^2t^2\right).$$

7.3 Characteristic function of Gaussian Variables

Since we know the mgf of a Gaussian variable $X\sim N(\mu,\sigma^2)$ is $M_X(t)=E[e^{tX}]=\exp\left(\mu t+\frac{1}{2}\sigma^2t^2\right)$, the characteristic function of $X$ is equal to $$\varphi_X(t)=E[e^{itX}]=M_X(it)=\exp\left(i\mu t-\frac{1}{2}\sigma^2t^2\right).$$

7.4 Moments of standard Gaussian

Assume that the random variable $X\sim N(0,1)$ and that $k\in\mathbb{N}$. Then $$E[X^k]=\begin{cases}0 & k\text{ odd},\\ (k-1)!! & k\text{ even}.\end{cases}$$
First we state an important lemma:

Relation between Moments and derivatives of mgf

$$M_X^{(n)}(0)=E[X^n].$$

Since we have the mgf of Gaussian variables, moments can be obtained via differentiation. We also provide a direct proof for the moments of the standard Gaussian.

If $k$ is odd, then it's obvious by symmetry that $E[X^k]=0$, so the following process assumes that $k$ is even. Substituting $t=x^2/2$,
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^k e^{-\frac{x^2}{2}}dx=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}x^k e^{-\frac{x^2}{2}}dx=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}(2t)^{\frac{k-1}{2}}e^{-t}dt=\sqrt{\frac{2}{\pi}}\,2^{\frac{k-1}{2}}\Gamma\left(\frac{k+1}{2}\right)=\sqrt{\frac{2}{\pi}}\,2^{\frac{k-1}{2}}\cdot\frac{(k-1)!!}{2^{k/2}}\Gamma\left(\frac{1}{2}\right)=(k-1)!!$$
(recall that $\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}$).
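The double-factorial formula can be checked against a direct numerical integration of $x^k\phi(x)$; this is a rough sketch (not from the notes) using a plain trapezoidal rule, with the truncation interval and step count chosen arbitrarily.

```python
from math import exp, pi, sqrt, isclose

def double_factorial(n: int) -> int:
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_moment(k: int, lo=-12.0, hi=12.0, steps=200_000) -> float:
    """Trapezoidal approximation of E[X^k] for X ~ N(0,1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x**k * exp(-x * x / 2)
    return total * h / sqrt(2 * pi)

assert isclose(gaussian_moment(4), double_factorial(3), rel_tol=1e-6)  # E[X^4] = 3!! = 3
assert abs(gaussian_moment(5)) < 1e-9                                  # odd moments vanish
```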

7.5 An important property of Standard Gaussian and Mills Ratio

The density $\phi$ of a standard Gaussian variable $X\sim N(0,1)$ has a useful property in terms of derivatives: $$\phi'(x)=-x\phi(x).$$

7.6 Expectation of absolute value of Gaussian variables

If a random variable $Z\sim N(0,1)$, then $E[|Z|]=\sqrt{\frac{2}{\pi}}$.

7.7 Rotational Invariance of Gaussian Variables

If a matrix $R\in\mathbb{R}^{n\times n}$ is orthogonal, meaning that $R^\top R=RR^\top=I_n$, then a random vector $X\sim N(0,\sigma^2 I_n)$ satisfies $$RX\overset{d}{=}X.$$ This can be proved using the linear transformation property of multivariate Gaussian variables.
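Rotational invariance can be illustrated by rotating a 2-dimensional standard Gaussian sample and checking that the first two moments still match $N(0,\sigma^2 I_2)$; a simulation sketch, not part of the original notes, with the angle, sample size, and seed chosen arbitrarily.

```python
import random
from math import cos, sin, pi

random.seed(1)
sigma = 1.0
theta = pi / 6  # an arbitrary rotation angle; R is orthogonal

N = 100_000
rx, ry = [], []
for _ in range(N):
    x1 = random.gauss(0.0, sigma)
    x2 = random.gauss(0.0, sigma)
    # Apply the rotation R = [[cos, -sin], [sin, cos]] to (x1, x2).
    rx.append(cos(theta) * x1 - sin(theta) * x2)
    ry.append(sin(theta) * x1 + cos(theta) * x2)

# The rotated sample should still look like N(0, sigma^2 I_2):
mean_x = sum(rx) / N
var_x = sum(v * v for v in rx) / N
cov_xy = sum(a * b for a, b in zip(rx, ry)) / N

assert abs(mean_x) < 0.02
assert abs(var_x - sigma**2) < 0.02
assert abs(cov_xy) < 0.02
```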

8 Multivariate Gaussian Distribution

8.1 Definition of Standard Normal Random Vector

A real random vector $X=(X_1,\dots,X_k)$ is called a standard normal random vector if all of its components $X_i$ ($i\in[k]$) are independent standard Gaussian variables. We denote it $X\sim N(0,I_k)$, where the mean vector is $0$ and the covariance matrix is $I_k$.

8.2 Definition of Normal random vector

A real random vector $X=(X_1,\dots,X_k)$ is called a normal random vector if there exists a standard normal random vector $Z\in\mathbb{R}^{l}$, a $k\times l$ matrix $A$, and a $k$-dimensional vector $\mu$ such that $X=AZ+\mu$ (Wikipedia). We denote it $X\sim N(\mu,\Sigma)$, where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix:
$$\mu=\begin{pmatrix}\mu_1\\\mu_2\\\vdots\\\mu_k\end{pmatrix},\qquad \Sigma=\begin{pmatrix}\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1k}\sigma_1\sigma_k\\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2k}\sigma_2\sigma_k\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{k1}\sigma_k\sigma_1 & \rho_{k2}\sigma_k\sigma_2 & \cdots & \sigma_k^2\end{pmatrix},$$
where $\Sigma_{ij}$ is the covariance between $X_i$ and $X_j$ ($i,j\in[k]$) and $\mu_i$ is the mean of $X_i$ ($i\in[k]$).

8.3 Joint pdf of multivariate Gaussian distribution

Assume that $X\in\mathbb{R}^k$, $X\sim N(\mu,\Sigma)$ with $\Sigma$ nonsingular. Then $$f_X(x_1,\dots,x_k)=\frac{1}{\sqrt{(2\pi)^k|\Sigma|}}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right\}.$$

8.4 Characteristic function of multivariate Gaussian variables

If $X\in\mathbb{R}^k$, $X\sim N(\mu,\Sigma)$, then $$\varphi_X(t)=E[e^{it^\top X}]=\exp\left\{it^\top\mu-\frac{1}{2}t^\top\Sigma t\right\},\quad t\in\mathbb{R}^k.$$

8.5 Linear Transformation of multivariate Gaussian variables

If $X\sim N(\mu,\Sigma)$ and $Y=\alpha+AX$, then $Y$ is again normal with $$E[Y]=A\mu+\alpha,\qquad \mathrm{Var}[Y]=A\Sigma A^\top.$$

9 Chi-squared Distribution

9.1 Definition of Chi-squared Distribution

Assume that we have $n$ i.i.d. samples $X_1,\dots,X_n$ from $N(0,1)$. Then $$\sum_{i=1}^{n}X_i^2\sim\chi_n^2,$$ which is called the Chi-squared distribution with $n$ degrees of freedom.

9.2 Pdf of Chi-squared Distribution

From the first part, if $X\sim N(0,1)$, then $Y=X^2\sim\chi_1^2$. For $y>0$,
$$F_Y(y)=P[X^2\le y]=P[-\sqrt{y}\le X\le\sqrt{y}]=\int_{-\sqrt{y}}^{\sqrt{y}}\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt.$$
Let $G(t)$ denote an antiderivative of $\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{t^2}{2}\right\}$. Therefore
$$f_Y(y)=F_Y'(y)=\left(G(\sqrt{y})-G(-\sqrt{y})\right)'=\frac{1}{\sqrt{2\pi}}e^{-\frac{y}{2}}\frac{1}{\sqrt{y}}\overset{(1)}{=}\frac{(1/2)^{1/2}}{\Gamma(1/2)}y^{\frac{1}{2}-1}e^{-\frac{1}{2}y},$$
where $(1)$ is because $\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}$.

pdf of $\chi_1^2$

$$f_Y(y)=\frac{1}{\sqrt{2\pi}}e^{-\frac{y}{2}}\frac{1}{\sqrt{y}},\quad y>0.$$

9.3 Relation to Gamma Distribution

In the derivation of the pdf of the Chi-squared distribution we can conclude that $$\chi_1^2\sim\Gamma\left(\frac{1}{2},\frac{1}{2}\right),$$ which is a special case of the Gamma distribution. Therefore, if a random variable $X\sim\chi_n^2$, then $X\sim\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$.
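The identification $\chi_1^2=\Gamma\left(\frac12,\frac12\right)$ can be verified pointwise by comparing the density just derived with the shape-rate Gamma pdf; a small sketch, not from the notes, with the probe points chosen arbitrarily.

```python
from math import exp, gamma, pi, sqrt, isclose

def chi2_1_pdf(y: float) -> float:
    # Density derived from Y = X^2 with X ~ N(0,1)
    return exp(-y / 2) / (sqrt(2 * pi) * sqrt(y))

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    # Shape-rate parametrization of the Gamma density
    return beta**alpha / gamma(alpha) * x ** (alpha - 1) * exp(-beta * x)

for y in (0.1, 0.5, 1.0, 2.5, 7.0):
    assert isclose(chi2_1_pdf(y), gamma_pdf(y, 0.5, 0.5), rel_tol=1e-12)
```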

9.4 Expectation of Chi-squared Distribution

Given that if a random variable $Y\sim\Gamma(\alpha,\beta)$ then $E[Y]=\frac{\alpha}{\beta}$, for a random variable $X\sim\chi_n^2=\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$ we get $E[X]=n$.

9.5 Variance of Chi-squared Distribution

Given that if a random variable $Y\sim\Gamma(\alpha,\beta)$ then $\mathrm{Var}[Y]=\frac{\alpha}{\beta^2}$, for a random variable $X\sim\chi_n^2$ we get $\mathrm{Var}[X]=2n$.

9.6 Mgf of Chi-squared Distribution

Given that for a Gamma distributed random variable $X\sim\Gamma(\alpha,\beta)$ the mgf is $M_X(t)=\left(1-\frac{t}{\beta}\right)^{-\alpha}$, and $Y\sim\chi_n^2$ is equivalent to $Y\sim\Gamma\left(\frac{n}{2},\frac{1}{2}\right)$, we get $$M_Y(t)=(1-2t)^{-\frac{n}{2}},\quad t<\frac{1}{2}.$$

9.7 Characteristic function of Chi-squared Distribution

Since for $Y\sim\chi_n^2$ we have $M_Y(t)=(1-2t)^{-\frac{n}{2}}$, therefore $$\varphi_Y(t)=M_Y(it)=(1-2it)^{-\frac{n}{2}}.$$

9.8 Asymptotic Property of Chi-squared Distribution

9.8.1 LLN

From the definition of the Chi-squared distribution we have $Y=\sum_{i=1}^{n}X_i^2\sim\chi_n^2$ with $X_i\sim N(0,1)$ i.i.d. Treating each $X_i^2$ as an i.i.d. sample from the same distribution with mean $E[X_i^2]=1$, the law of large numbers gives $$\frac{\sum_{i=1}^{n}X_i^2}{n}\overset{P}{\to}E[X_i^2]=1,\quad\text{thus}\quad \frac{Y}{n}\overset{P}{\to}1,\quad Y\sim\chi_n^2.$$

9.8.2 CLT

Similar to the LLN part, since the $X_i^2$ are i.i.d. with mean $1$ and variance $\mathrm{Var}[X_i^2]=E[X_i^4]-E[X_i^2]^2=3-1=2$, we can apply the Central Limit Theorem to get convergence in distribution: $$\frac{Y-n}{\sqrt{2n}}\overset{L}{\to}N(0,1),\quad Y\sim\chi_n^2.$$
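The CLT statement can be illustrated by simulating standardized $\chi_n^2$ samples and checking that their first two moments are close to those of $N(0,1)$; a simulation sketch, not part of the original notes, with `n`, `N`, and the seed chosen arbitrarily.

```python
import random
from math import sqrt

random.seed(2)
n = 50          # degrees of freedom
N = 20_000      # number of chi-squared samples

zs = []
for _ in range(N):
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    zs.append((y - n) / sqrt(2 * n))

mean = sum(zs) / N
var = sum(z * z for z in zs) / N - mean**2

# The standardized values should be close to N(0, 1).
assert abs(mean) < 0.05
assert abs(var - 1.0) < 0.05
```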

10 Exponential Distribution

10.1 Pdf of Exponential Distribution

If a random variable $X\sim\mathrm{Exp}(\lambda)$, then the pdf of $X$ is $$f(x)=\begin{cases}\lambda e^{-\lambda x} & x>0,\\ 0 & \text{elsewhere}.\end{cases}$$

10.2 The tail probability of Exponential Distribution

If $X\sim\mathrm{Exp}(\lambda)$ with $\lambda>0$, then the tail probability is $P[X\ge t]=e^{-\lambda t}$ for $t\ge 0$.

10.3 Memoryless Property

Memoryless property of Exponential distributed variable

$$P[X>s+t\mid X>s]=P[X>t].$$

This holds since $$P[X>s+t\mid X>s]=\frac{P[X>s+t]}{P[X>s]}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}.$$
Rewriting $P[X>s+t\mid X>s]$ as $P[X-s>t\mid X>s]$, we can conclude that $(X-s)\mid X>s$ and $X$ have the same distribution.
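The memoryless property can be checked empirically with `random.expovariate`; this is an illustrative sketch (not from the notes), with the rate, thresholds, and seed chosen arbitrarily.

```python
import random
from math import exp

random.seed(3)
lam, s, t = 1.5, 0.8, 0.6

N = 300_000
samples = [random.expovariate(lam) for _ in range(N)]

survivors = [x for x in samples if x > s]
cond = sum(1 for x in survivors if x > s + t) / len(survivors)
uncond = sum(1 for x in samples if x > t) / N

# Both should estimate e^{-lambda * t}.
assert abs(cond - exp(-lam * t)) < 0.01
assert abs(uncond - exp(-lam * t)) < 0.01
```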

10.4 Relation to Gamma Distribution

The exponential distribution is a special case of the Gamma distribution: using the shape-rate version, for $X\sim\Gamma(1,\alpha)$ we find $$f(x)=\frac{\alpha}{\Gamma(1)}x^{1-1}e^{-\alpha x}=\alpha e^{-\alpha x},\quad x>0.$$ Therefore $X\sim\mathrm{Exp}(\lambda)\iff X\sim\Gamma(1,\lambda)$.

11 Poisson Distribution

11.1 Pmf of Poisson Distribution

Assume a random variable $X\sim\mathrm{Poisson}(\lambda)$. The pmf of $X$ is given as follows: $$P[X=k]=\frac{e^{-\lambda}\lambda^k}{k!},\quad \lambda>0,\ k=0,1,2,\dots$$

11.2 Expectation and variance of Poisson Distribution

First we have an important property of the Poisson distribution: if $X\sim\mathrm{Poisson}(\lambda)$, then the factorial moments satisfy $$E[X(X-1)\cdots(X-k)]=\lambda^{k+1},\quad k=0,1,2,\dots$$ Using this property we can quickly get $E[X]=\lambda$ and $E[X^2]=E[X(X-1)]+E[X]=\lambda^2+\lambda$; therefore $\mathrm{Var}[X]=E[X^2]-E[X]^2=\lambda$.

11.3 Mgf of Poisson Distribution

If a random variable $X\sim\mathrm{Poisson}(\lambda)$ where $\lambda>0$, then the mgf of $X$ is $$M_X(t)=\exp\{\lambda(e^t-1)\}.$$

11.4 Reproducibility of Poisson distribution

If $X_1,\dots,X_n$ are independent variables satisfying $X_i\sim\mathrm{Poisson}(\lambda_i)$, then $$\sum_{i=1}^{n}X_i\sim\mathrm{Poisson}\left(\sum_{i=1}^{n}\lambda_i\right).$$
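Reproducibility can be checked exactly (no simulation needed) by convolving two Poisson pmfs and comparing with the pmf of the summed parameter; a small sketch, not part of the original notes, with arbitrary rates.

```python
from math import exp, factorial, isclose

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 2.0, 3.5

# Convolve the two pmfs and compare with Poisson(lam1 + lam2).
for k in range(20):
    conv = sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
    assert isclose(conv, poisson_pmf(k, lam1 + lam2), rel_tol=1e-9)
```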

11.5 Poisson approximation to the Binomial distribution

For $n$ Bernoulli trials with success probability $p$, if $n$ is large and $p$ is small (with $np$ moderate), then the binomial distribution is approximately the Poisson distribution with parameter $\lambda=np$.

12 Gamma Distribution

12.1 Γ function

The Gamma function Γ() is defined as follows:

$$\Gamma(\alpha)=\int_0^{\infty}y^{\alpha-1}e^{-y}dy,\quad \alpha>0.$$

Integration by parts shows that, for $\alpha>1$, $$\Gamma(\alpha)=(\alpha-1)\int_0^{\infty}y^{\alpha-2}e^{-y}dy=(\alpha-1)\Gamma(\alpha-1).$$ Given that $\Gamma(1)=\int_0^{\infty}e^{-y}dy=1$, it follows that if $\alpha$ is a positive integer, then $\Gamma(\alpha)=(\alpha-1)!$.

Recursive property of Gamma function

For all $x>0$, the Gamma function satisfies the following recursion: $$\Gamma(x+1)=x\Gamma(x),\quad x>0.$$
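Python's `math.gamma` can be used to spot-check the recursion, the factorial values, and the value $\Gamma\left(\frac12\right)=\sqrt{\pi}$ used repeatedly in these notes; a quick sketch, not part of the original text, with arbitrary probe points.

```python
from math import gamma, factorial, isclose, pi, sqrt

# Recursion: Gamma(x+1) = x * Gamma(x)
for x in (0.5, 1.7, 3.2, 9.9):
    assert isclose(gamma(x + 1), x * gamma(x), rel_tol=1e-12)

# Integer values: Gamma(n) = (n-1)!
for n in range(1, 10):
    assert isclose(gamma(n), factorial(n - 1))

# Gamma(1/2) = sqrt(pi)
assert isclose(gamma(0.5), sqrt(pi), rel_tol=1e-12)
```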

12.2 Γ(α,β) Distribution (shape-scale version)

We say that the continuous random variable $X$ has a $\Gamma$-distribution with parameters $\alpha>0$ and $\beta>0$ if its pdf is $$f(x)=\begin{cases}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta} & 0<x<\infty,\\ 0 & \text{elsewhere}.\end{cases}$$ We often write that $X$ has a $\Gamma(\alpha,\beta)$ distribution, where $\alpha$ is the shape parameter and $\beta$ is the scale parameter.

12.3 Γ(α,β) Distribution (shape-rate version)

We say that the continuous random variable $X$ has a $\Gamma$-distribution with parameters $\alpha>0$ and $\beta>0$ if its pdf is $$f(x)=\begin{cases}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x} & 0<x<\infty,\\ 0 & \text{elsewhere}.\end{cases}$$ We often write that $X$ has a $\Gamma(\alpha,\beta)$ distribution, where $\alpha$ is the shape parameter and $\beta$ is the rate parameter. Throughout this article we use this version of the Gamma distribution.

12.4 Expectation of Γ(α,β) Distribution

We can use the definition of the Gamma function to simplify the computation of the integral:
$$E[X]=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\cdot x\,dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha}e^{-\beta x}dx=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}\int_0^{\infty}\frac{\beta^{\alpha+1}}{\Gamma(\alpha+1)}x^{\alpha}e^{-\beta x}dx=\frac{\alpha}{\beta}.$$

12.5 Variance of Γ(α,β) Distribution

Similarly, we can use the definition of the Gamma function to simplify the computation of the integral:
$$E[X^2]=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\cdot x^2\,dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha+1}e^{-\beta x}dx=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha+2)}{\beta^{\alpha+2}}\int_0^{\infty}\frac{\beta^{\alpha+2}}{\Gamma(\alpha+2)}x^{\alpha+1}e^{-\beta x}dx=\frac{\alpha(\alpha+1)}{\beta^2}.$$
Therefore $\mathrm{Var}[X]=E[X^2]-(E[X])^2=\frac{\alpha}{\beta^2}$.

12.6 Mgf of Γ(α,β) distribution

$$M_X(t)=\int_0^{\infty}e^{tx}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}dx=\int_0^{\infty}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\beta-t)x}dx=\left(\frac{1}{1-\frac{t}{\beta}}\right)^{\alpha},\quad t<\beta.$$

12.6.1 Additivity property of the Gamma Distribution

If $X\sim\Gamma(\alpha_1,\theta)$ and $Y\sim\Gamma(\alpha_2,\theta)$ are independent, then $X+Y\sim\Gamma(\alpha_1+\alpha_2,\theta)$.

This can be proved using the mgf of the Gamma distribution: $M_{X+Y}(t)=M_X(t)M_Y(t)=\left(1-\frac{t}{\theta}\right)^{-(\alpha_1+\alpha_2)}$.
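Additivity can also be illustrated by simulation with `random.gammavariate`; note that Python's standard library uses the shape-scale parametrization, so a rate $\theta$ corresponds to scale $1/\theta$. This sketch (not from the notes, with arbitrary parameters and seed) only checks the first two moments, whereas the mgf argument above proves equality of distributions.

```python
import random

random.seed(4)
a1, a2, rate = 1.5, 2.5, 2.0
scale = 1 / rate  # random.gammavariate uses the shape-scale parametrization

N = 200_000
sums = [random.gammavariate(a1, scale) + random.gammavariate(a2, scale) for _ in range(N)]

mean = sum(sums) / N
var = sum(s * s for s in sums) / N - mean**2

# X + Y should behave like Gamma(a1 + a2, rate): mean (a1+a2)/rate, var (a1+a2)/rate^2
assert abs(mean - (a1 + a2) / rate) < 0.02
assert abs(var - (a1 + a2) / rate**2) < 0.02
```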

13 Inverse Gamma Distribution

13.1 Pdf of Inverse Gamma Distribution

If a random variable $Y\sim\mathrm{IG}(\alpha,\beta)$, then $$f(y)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{-(\alpha+1)}e^{-\beta/y},\quad y>0.$$

14 Beta Distribution

14.1 Beta Function

The Beta function is defined by the integral $$B(r_1,r_2)=\int_0^1 t^{r_1-1}(1-t)^{r_2-1}dt.$$

Association between Beta function and Gamma function

$$B(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$

Write $\Gamma(\alpha)\Gamma(\beta)=\int_0^{\infty}\int_0^{\infty}e^{-x-y}x^{\alpha-1}y^{\beta-1}dx\,dy$ and apply the variable substitution $x=st$, $y=s(1-t)$ (with Jacobian $s$). The integral becomes
$$\int_0^{\infty}\int_0^{1}e^{-s}(st)^{\alpha-1}(s(1-t))^{\beta-1}s\,dt\,ds=\int_0^{\infty}e^{-s}s^{\alpha+\beta-1}ds\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}dt=\Gamma(\alpha+\beta)B(\alpha,\beta),$$
thus we have finished the proof.
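The identity can be spot-checked numerically by approximating the Beta integral with a midpoint rule and comparing with `math.gamma`; a rough sketch, not part of the original notes, with the step count and probe parameters chosen arbitrarily.

```python
from math import gamma, isclose

def beta_numeric(a: float, b: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the Beta integral on (0, 1)."""
    h = 1.0 / steps
    return sum(
        ((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
        for i in range(steps)
    ) * h

for a, b in [(2.0, 3.0), (1.5, 0.8), (4.2, 4.2)]:
    exact = gamma(a) * gamma(b) / gamma(a + b)
    assert isclose(beta_numeric(a, b), exact, rel_tol=1e-3)
```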

Another expression of Beta function

We can get another version of the Beta function using the variable substitution $t=\frac{x}{1+x}$, $x\in(0,\infty)$, so that $1-t=\frac{1}{1+x}$ and $dt=\frac{dx}{(1+x)^2}$. Then the integral is equivalent to
$$B(r_1,r_2)=\int_0^{\infty}\left(\frac{x}{1+x}\right)^{r_1-1}\left(\frac{1}{1+x}\right)^{r_2-1}\frac{1}{(1+x)^2}dx=\int_0^{\infty}x^{r_1-1}\left(\frac{1}{1+x}\right)^{r_1+r_2}dx=\int_0^{\infty}x^{r_1-1}(1+x)^{-r_1-r_2}dx.$$

14.2 Pdf of beta distribution

The beta distribution $\mathrm{Beta}(\alpha,\beta)$ is a two-parameter distribution ($\alpha>0$, $\beta>0$) with range $[0,1]$ and pdf
$$f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}=\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}.$$
For integer parameters the normalizing constant equals $\frac{(\alpha+\beta-1)!}{(\alpha-1)!(\beta-1)!}$.

14.3 Expectation of beta distribution

We can use the pdf of the beta distribution to get the expectation of a beta distributed random variable easily, as in the computation that follows:
$$E[X]=\int_0^1\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}\cdot x\,dx=\int_0^1\frac{1}{B(\alpha,\beta)}x^{\alpha}(1-x)^{\beta-1}dx=\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}\int_0^1\frac{1}{B(\alpha+1,\beta)}x^{\alpha}(1-x)^{\beta-1}dx=\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}=\frac{\alpha}{\alpha+\beta}.$$

14.4 Variance of beta distribution

Similarly, we can use the pdf of the beta distribution to derive $E[X^2]=\frac{B(\alpha+2,\beta)}{B(\alpha,\beta)}$, and with $\mathrm{Var}[X]=E[X^2]-E[X]^2$ we get
$$\mathrm{Var}[X]=\frac{B(\alpha+2,\beta)}{B(\alpha,\beta)}-\left(\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}\right)^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

14.5 Relation to Chi-squared Distribution

Let $X$ and $Y$ be independent variables with $X\sim\chi_n^2$ and $Y\sim\chi_m^2$. Then $$\frac{X}{X+Y}\sim\mathrm{Beta}\left(\frac{n}{2},\frac{m}{2}\right).$$

15 Cauchy Distribution

15.1 Pdf of Cauchy Distribution

A random variable $X$ is said to follow a Cauchy distribution with location parameter $\theta\in\mathbb{R}$ and scale parameter $\gamma>0$, written $X\sim\mathrm{Cauchy}(\theta,\gamma)$, if $$f(x;\theta,\gamma)=\frac{1}{\pi\gamma\left[1+\left(\frac{x-\theta}{\gamma}\right)^2\right]},\quad x\in\mathbb{R}.$$

A special property of Cauchy Distribution

The expectation of the Cauchy distribution doesn't exist, since $\int_{-\infty}^{\infty}|x|\,f(x;\theta,\gamma)\,dx=\infty$.

15.2 Median of Cauchy Distribution

Although the mean does not exist, the pdf is symmetric about $\theta$, so the median of $\mathrm{Cauchy}(\theta,\gamma)$ is $\theta$.

15.3 Characteristic function of Cauchy distribution

For $X\sim\mathrm{Cauchy}(\theta,\gamma)$, $$\varphi_X(t)=E[e^{itX}]=\exp\{i\theta t-\gamma|t|\}.$$ Note that $\varphi_X$ is not differentiable at $t=0$, consistent with the fact that the mean does not exist.

16 T Distribution

16.1 Definition

Assume that $X\sim N(0,1)$ and $Y\sim\chi_n^2$ are independent. Then the statistic

$$T=\frac{X}{\sqrt{Y/n}}$$

is said to follow a t-distribution with $n$ degrees of freedom, written $T\sim t_n$.

16.2 Pdf of t-distribution

For a random variable $T\sim t_n$, the density function of $T$ is given by $$f_T(t)=\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{\pi n}}\left(\frac{t^2}{n}+1\right)^{-\frac{n+1}{2}}.$$
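A basic sanity check on this density is that it integrates to 1; the sketch below (not from the notes) approximates the integral with a trapezoidal rule over a wide interval, with `n`, the interval, and the step count chosen arbitrarily.

```python
from math import gamma, pi, sqrt, isclose

def t_pdf(t: float, n: int) -> float:
    c = gamma((n + 1) / 2) / (gamma(n / 2) * sqrt(pi * n))
    return c * (t * t / n + 1) ** (-(n + 1) / 2)

# Trapezoidal check that the density integrates to (almost) 1.
n, lo, hi, steps = 5, -200.0, 200.0, 400_000
h = (hi - lo) / steps
total = sum(t_pdf(lo + i * h, n) for i in range(steps + 1)) * h
total -= 0.5 * h * (t_pdf(lo, n) + t_pdf(hi, n))

assert isclose(total, 1.0, rel_tol=1e-3)
```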

17 F Distribution

17.1 Definition

Assume that $X\sim\chi_n^2$ and $Y\sim\chi_m^2$ are independent. Then the statistic $$F=\frac{X/n}{Y/m}\sim F_{n,m}$$ is said to follow an F-distribution with parameters $n$ and $m$.

17.2 Pdf of F-distribution

From the definition we have $F=\frac{X/n}{Y/m}$; this is an example of the density of a ratio of two (scaled) Chi-squared distributed variables. The pdf of $F\sim F_{n,m}$ is given below:
$$f_F(f)=\frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{n}{m}\right)^{\frac{n}{2}}f^{\frac{n}{2}-1}\left(\frac{n}{m}f+1\right)^{-\frac{m+n}{2}},\quad f>0.$$

17.3 Property of F-distribution

$$F_{n,m}(1-\alpha)=\frac{1}{F_{m,n}(\alpha)},$$

where $F_{n,m}(\alpha)$ denotes the $\alpha$-quantile of the $F_{n,m}$ distribution. This follows because if $X\sim F_{n,m}$ then $1/X\sim F_{m,n}$: the event $X\le F_{n,m}(1-\alpha)$, which has probability $1-\alpha$, is the same as $1/X\ge 1/F_{n,m}(1-\alpha)$, so $1/F_{n,m}(1-\alpha)$ is the $\alpha$-quantile of $F_{m,n}$.

17.4 Expectation and Variance of F-distribution

If a random variable $X\sim F_{n,m}$, then $$E[X]=\frac{m}{m-2}\ (m>2),\qquad \mathrm{Var}[X]=\frac{2m^2(n+m-2)}{n(m-2)^2(m-4)}\ (m>4).$$

For the variance of the F-distribution we can use a calculation similar to that of its expectation, using the second expression of the Beta function.


  1. Characteristic function ↩︎

  2. Negative Binomial ↩︎

  3. Hypergeometric ↩︎